Code Similarity Comparison of Multiple Source Trees
Abstract
This paper outlines the design of a code comparison tool, ctcompare, which uses short sequences of lexical tokens from source code as keys in an inverted index to perform the code comparison. This technique allows multiple source code trees to be compared simultaneously. Other significant features of the tool include a serialised token stream format, which allows a source tree to be analysed independently without revealing the full source code, and isomorphic code comparison, which identifies renamed identifiers.
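The indexing idea in the abstract can be illustrated with a minimal Python sketch. It is not ctcompare's implementation: the tokenizer, the keyword set, the window length `n=4`, and the names `tokenize`, `normalise`, `build_index` and `shared_keys` are all assumptions made for illustration. Renamed identifiers are approximated here by replacing every non-keyword identifier with a single placeholder, which is cruder than a true isomorphic comparison that tracks a consistent one-to-one renaming.

```python
import re
from collections import defaultdict

# Crude lexer (assumption): identifiers, integer literals, or any single symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
# Tiny illustrative keyword set (assumption), not a full language definition.
KEYWORDS = {"if", "else", "for", "while", "return", "int"}


def tokenize(source):
    """Split source text into a flat list of lexical tokens."""
    return TOKEN_RE.findall(source)


def normalise(tokens):
    """Replace every non-keyword identifier with the placeholder 'ID'."""
    return [
        "ID" if re.match(r"[A-Za-z_]", tok) and tok not in KEYWORDS else tok
        for tok in tokens
    ]


def build_index(trees, n=4):
    """Inverted index: n-token sequence -> set of (tree, path, position)."""
    index = defaultdict(set)
    for tree, files in trees.items():
        for path, source in files.items():
            toks = normalise(tokenize(source))
            for i in range(len(toks) - n + 1):
                index[tuple(toks[i:i + n])].add((tree, path, i))
    return index


def shared_keys(index):
    """Keep only the token sequences that occur in more than one tree."""
    return {
        key: locs
        for key, locs in index.items()
        if len({tree for tree, _, _ in locs}) > 1
    }


# Three hypothetical source trees, each holding one file as a string.
trees = {
    "a": {"x.c": "int sum = a + b;"},
    "b": {"y.c": "int total = x + y;"},
    "c": {"z.c": "return 0;"},
}
index = build_index(trees)
shared = shared_keys(index)
```

Because every tree is added to the same index, one pass finds the sequences common to any number of trees, with no need to compare the trees pair by pair. In this example the statements in trees `a` and `b` differ only in their identifier names, so they map to the same keys.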
Similar resources
Code Similarity Detection in Multiple Large Source Trees using Token Hashes
The ability to find similarities between two source code bases, or within one code base, has many uses including the detection of student plagiarism, the identification of intellectual property violations and the location of repeated code in a code base amenable to refactoring. Previous structure-metric approaches have used either suffix trees or modified Longest Common Subsequence algorithms t...
TOPD/FMTS: a new software to compare phylogenetic trees
SUMMARY TOPD/FMTS has been developed to evaluate similarities and differences between phylogenetic trees. The software implements several new algorithms (including the Disagree method, which returns the taxa that disagree between two trees, and the Nodal method, which compares two trees using nodal information) and several previously described methods (such as the Partition method, Triplets or Quar...
Deep Learning Similarities from Different Representations of Source Code
Assessing the similarity between code components plays a pivotal role in a number of Software Engineering (SE) tasks, such as clone detection, impact analysis, refactoring, etc. Code similarity is generally measured by relying on manually defined or hand-crafted features, e.g., by analyzing the overlap among identifiers or comparing the Abstract Syntax Trees of two code components. These featur...
A Comparison of Similarity Techniques for Detecting Source Code Plagiarism
Academic dishonesty is a universal problem. Detecting duplicated text among natural language artifacts is a well-documented task. However, performing similar analysis on source code presents unique problems. In this paper, I present a comparison of the application of various techniques in textual similarity processing on source code. Beyond this, I investigate the application of textual similari...
Pinda: A Web service for detection and analysis of intraspecies gene duplication events
We present Pinda, a Web service for the detection and analysis of possible duplications of a given protein or DNA sequence within a source species. Pinda fully automates the whole gene duplication detection procedure, from performing the initial similarity searches, to generating the multiple sequence alignments and the corresponding phylogenetic trees, to bootstrapping the trees and producing ...